A Conservative Property of a Nested Relational Query Language
We proposed in [7] a nested relational calculus and a nested relational algebra based on structural recursion [6,5] and on monads [27,16]. In this report, we describe relative set abstraction as our third nested relational query language. This query language is similar to the well-known list comprehension mechanism in functional programming languages such as Haskell [11], Miranda [24], and KRC [23]. It is equivalent to our earlier query languages both in terms of semantics and in terms of equational theories. This strong sense of equivalence allows our three query languages to be freely combined into a nested relational query language that is robust and user-friendly.
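As a rough illustration of how a comprehension-style syntax reads as a query over nested relations, the sketch below uses Python set comprehensions. This is only an analogue: the relation `departments` and its contents are hypothetical, and Python is not the language defined in the report.

```python
# Hypothetical nested relation: each tuple pairs a department name
# with a set of employee names (frozenset, so it can sit inside a set).
departments = {
    ("db", frozenset({"ann", "bob"})),
    ("bio", frozenset({"cy"})),
}

# Unnest: flatten the inner sets into (department, employee) pairs.
unnested = {(d, e) for (d, emps) in departments for e in emps}

# Nest: regroup the flat pairs by department.
renested = {
    (d, frozenset(e for (d2, e) in unnested if d2 == d))
    for (d, _) in unnested
}
```

Here `renested` reproduces `departments`, because no department in this example has an empty employee set.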
Controlling False Positives in Association Rule Mining
Association rule mining is an important problem in the data mining area. It
enumerates and tests a large number of rules on a dataset and outputs rules
that satisfy user-specified constraints. Due to the large number of rules being
tested, rules that do not represent a real systematic effect in the data can
satisfy the given constraints purely by random chance. Hence association rule
mining often suffers from a high risk of false positive errors. There is a lack
of comprehensive study on controlling false positives in association rule
mining. In this paper, we adopt three multiple testing correction
approaches---the direct adjustment approach, the permutation-based approach and
the holdout approach---to control false positives in association rule mining,
and conduct extensive experiments to study their performance. Our results show
that (1) numerous spurious rules are generated if no correction is made, and (2)
the three approaches can control false positives effectively. Among the three
approaches, the permutation-based approach has the highest power for detecting
real association rules, but it is very computationally expensive. We employ
several techniques to reduce its cost effectively. Comment: VLDB201
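Two of the correction styles named in the abstract can be sketched as follows. This is a minimal sketch under assumptions: the direct adjustment is shown in its Bonferroni form, and the permutation cutoff uses the maximum score over all rules under shuffled class labels. The function names, the `score` callback, and the parameters are hypothetical and are not taken from the paper.

```python
import random

def bonferroni_keep(p_values, alpha=0.05):
    """Keep rules whose p-value passes alpha divided by the number of tests."""
    m = len(p_values)
    return {rule for rule, p in p_values.items() if p <= alpha / m}

def permutation_cutoff(score, rules, labels, alpha=0.05, rounds=200, seed=0):
    """Estimate a score cutoff from the best score seen under shuffled labels.

    score(rule, labels) is a caller-supplied function (an assumption here)
    that returns a larger number for a stronger rule.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    best = []
    for _ in range(rounds):
        rng.shuffle(shuffled)  # break any real link between items and labels
        best.append(max(score(rule, shuffled) for rule in rules))
    best.sort()
    # Only a small fraction (roughly alpha) of shuffles exceed this value.
    return best[min(len(best) - 1, int((1 - alpha) * len(best)))]
```

Each shuffle re-scores every rule, which is consistent with the abstract's remark that the permutation-based approach is computationally expensive.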
A Bounded Degree Property and Finite-Cofiniteness of Graph Queries
We provide new techniques for the analysis of the expressive power of query languages for nested collections. These languages may use set or bag semantics and may be further complicated by the presence of aggregate functions. We exhibit certain classes of graphs and prove that properties of these graphs that can be tested in such languages are either finite or cofinite. This result settles the conjectures of Grumbach, Milo, and Paredaens that parity test, transitive closure, and balanced binary tree test are not expressible in bag languages like BALG of Grumbach and Milo and BQL of Libkin and Wong. Moreover, it implies that many recursive queries, including simple ones like test for a chain, cannot be expressed in a nested relational language even when aggregate functions are available. In an attempt to generalize the finite-cofiniteness result, we study the bounded degree property, which says that the number of distinct in- and out-degrees in the output of a graph query does not depend on the size of the input if the input is simple. We show that such a property implies a number of inexpressibility results in a uniform fashion. We then prove the bounded degree property for the nested relational language.
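To make the bounded degree property concrete, the sketch below counts distinct in- and out-degrees for a chain and for its transitive closure. The helper names are hypothetical, and the chain is used here as an assumed example of a simple input.

```python
def degree_count(edges):
    """Return (number of distinct in-degrees, number of distinct out-degrees)."""
    nodes = {v for e in edges for v in e}
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for a, b in edges:
        out_deg[a] += 1
        in_deg[b] += 1
    return len(set(in_deg.values())), len(set(out_deg.values()))

def chain(n):
    """Edges of the chain 0 -> 1 -> ... -> n-1."""
    return {(i, i + 1) for i in range(n - 1)}

def transitive_closure(edges):
    """Repeatedly join the edge set with itself until nothing new appears."""
    closure = set(edges)
    while True:
        extra = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if extra <= closure:
            return closure
        closure |= extra
```

A chain has two distinct in-degrees and two distinct out-degrees whatever its length, while the closure of a chain on n nodes has n of each. The count for the closure grows with the input, which is the kind of behaviour the bounded degree property rules out.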
Relational Foundations For Functorial Data Migration
We study the data transformation capabilities associated with schemas that
are presented by directed multi-graphs and path equations. Unlike most
approaches which treat graph-based schemas as abbreviations for relational
schemas, we treat graph-based schemas as categories. A schema $S$ is a
finitely-presented category, and the collection of all $S$-instances forms a
category, $S$-inst. A functor $F$ between schemas $S$ and $T$, which can be
generated from a visual mapping between graphs, induces three adjoint data
migration functors, $\Sigma_F$: $S$-inst $\to$ $T$-inst, $\Pi_F$: $S$-inst $\to$ $T$-inst, and $\Delta_F$: $T$-inst $\to$ $S$-inst. We present an algebraic query
language FQL based on these functors, prove that FQL is closed under
composition, prove that FQL can be implemented with the
select-project-product-union relational algebra (SPCU) extended with a
key-generation operation, and prove that SPCU can be implemented with FQL.
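Of the three migration functors, the one that pulls an instance back along a schema mapping is the simplest to sketch, since it amounts to precomposition with the mapping. The code below is a minimal sketch under assumptions: schemas are plain dictionaries, path equations are not checked, and every name (`pull_back`, `Emp`, `worksIn`, and so on) is hypothetical rather than part of FQL.

```python
def pull_back(source_edges, node_map, edge_map, sets, funcs):
    """Precompose a target-schema instance with a schema mapping.

    source_edges: source edge -> (source node, destination node)
    node_map:     source node -> target node
    edge_map:     source edge -> list of target edges, followed left to right
    sets:         target node -> set of row ids
    funcs:        target edge -> dict from row id to row id
    """
    new_sets = {s: set(sets[t]) for s, t in node_map.items()}
    new_funcs = {}
    for edge, (src, _dst) in source_edges.items():
        table = {}
        for row in new_sets[src]:
            value = row
            for target_edge in edge_map[edge]:  # follow the path edge by edge
                value = funcs[target_edge][value]
            table[row] = value
        new_funcs[edge] = table
    return new_sets, new_funcs

# Hypothetical target instance: Emp --worksIn--> Dept --locatedIn--> City.
sets = {"Emp": {"e1", "e2"}, "Dept": {"d1", "d2"}, "City": {"c1", "c2"}}
funcs = {
    "worksIn": {"e1": "d1", "e2": "d2"},
    "locatedIn": {"d1": "c1", "d2": "c1"},
}

# Hypothetical source schema: Person --city--> Place, mapped onto the path above.
source_edges = {"city": ("Person", "Place")}
node_map = {"Person": "Emp", "Place": "City"}
edge_map = {"city": ["worksIn", "locatedIn"]}

migrated_sets, migrated_funcs = pull_back(source_edges, node_map, edge_map, sets, funcs)
```

In this example the single source edge `city` is interpreted by composing the two target functions, so both people end up in city `c1`.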
Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets
10.1186/1471-2164-12-S3-S20
10th Int. Conference on Bioinformatics - 1st ISCB Asia Joint Conference 2011, InCoB 2011/ISCB-Asia 2011: Computational Biology - Proceedings from Asia Pacific Bioinformatics Network (APBioNet), 12, SUPPL.
Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed to date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular detection of sparse and small
or sub-complexes and discerning overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable to understand disease mechanisms and
to discover novel therapeutic targets. We hope this review aptly commemorates a
decade of research on computational prediction of complexes and constitutes a
valuable reference for further advancements in this exciting area. Comment: 1 Tabl
A Performance Study of Three Disk-based Structures for Indexing and Querying Frequent Itemsets
Proceedings of the VLDB Endowment, 6(7), 505-51
CAMBer: an approach to support comparative analysis of multiple bacterial strains
10.1109/BIBM.2010.5706549
Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010, 121-12
A Flexible Approach to Finding Representative Pattern Sets
10.1109/TKDE.2013.27
IEEE Transactions on Knowledge and Data Engineerin